UNSUPERVISED DATA SEGMENTATION



The present invention relates to a method and apparatus for unsupervised data
segmentation which is suitable for assigning multi-dimensional data points of a data
set amongst a plurality of classes. The invention is particularly applicable to
automated image segmentation, for instance in the field of medical imaging, thus
allowing different parts of imaged objects to be recognised and demarcated
automatically.

In the field of automated data processing it is useful to be able to recognise 
automatically different groups of data points within the data set. This is known as 
segmentation and it involves assigning the data points in the data set to different
groups or classes. 

An example of a field in which segmentation is useful is the field of image
processing. A typical imaged scene contains one or more objects and background,
and it would be useful to be able to recognise reliably and automatically the different
parts of the scene. Typically this may be done by segmenting the image on the basis
of the different intensities or colours appearing in the image. Image segmentation is
applicable in a wide variety of imaging applications such as security monitoring,
photo interpretation, examination of industrial parts or assemblies, and medical
imaging. In medical imaging, for instance, it is useful to be able to distinguish
different types of tissue or organs or to distinguish abnormalities such as an
aneurysm or tumour from normal tissue. Currently, particularly in medical imaging,
segmentation involves considerable input from a clinician in an interactive method.

For example, there have been proposals for methods of demarcating an
aneurysm in an image of vasculature by first identifying the aneurysm neck, then
labelling all pixels on one side of the neck as forming the aneurysm, while pixels on
the other side are identified as part of the adjoining vessel. Such techniques are
described in R. van der Weide, K. Zuiderveld, W. Mali and M. Viergever, "CTA-based
angle selection for diagnostic and interventional angiography of saccular
intracranial aneurysms", IEEE Transactions on Medical Imaging, Vol. 17, No. 5,
pp831-841, 1998 and D. Wilson, D. Royston, J. Noble and J. Byrne, "Determining
X-ray projections for coil treatments of intracranial aneurysms", IEEE Transactions
on Medical Imaging, Vol. 18, No. 10, pp973-980, 1999. However, these techniques
also rely on manual intervention for starting the segmentation.

Techniques of segmentation using region-splitting or region growing are well
known, see for example: Rolf Adams and Leanne Bischof, "Seeded Region
Growing", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol.
16, No. 6, pp641-647, Jun. 1994. However, these techniques require that the number
of regions into which the data set is to be segmented is known in advance. Thus the
techniques are not generally applicable to fully automatic methods.

Segmentation techniques in which there is no initial assumption of the
number of classes found in the data set are referred to as "unsupervised"
segmentation techniques. An unsupervised segmentation algorithm has been
proposed in Charles Kervrann and Fabrice Heitz, "A Markov Random Field
model-based approach to unsupervised texture segmentation using local and global
spatial statistics", Technical Report No. 2062, INRIA, Oct. 1993. This utilises an
augmented Markov Random Field, where an extra class label is defined for new
regions, and a parameter is pre-set to define the probability assigned to this extra
state. Any points in the data set which are modelled sufficiently badly (assigned a
low probability by the existing classes) will be assigned to this new class. At each
iteration of the algorithm, connected components of such points are collated into new
classes.

However, typical problems with unsupervised techniques are
under-segmentation (in which data points are added to inappropriate classes) and
over-segmentation (in which the data is divided into too many classes).

One aspect of the present invention provides an unsupervised segmentation
method which is generally applicable to multi-dimensional data sets. Thus, it allows
for completely automatic segmentation of the data points into a plurality of classes,
without any prior knowledge of the number of classes involved.

In more detail this aspect of the invention provides an unsupervised
segmentation method for assigning multi-dimensional data points of a selected data
set amongst a plurality of classes, the method comprising the steps of:

(a) defining an initial class encompassing all data points of the selected data
set;

(b) defining a second class by selecting a data point and assigning it to the
second class together with data points within a first predetermined
neighbourhood of the selected data point;

(c) testing each data point lying within a second predetermined
neighbourhood of data points in the second class by calculating the
probability that each said data point belongs to the first class and the
probability that it belongs to the second class, and assigning it to the second
class if the probability that it belongs to the second class is higher;

(d) said probability calculations being adapted during said method in
dependence upon the assignment of the points to the classes.

The probability calculations may comprise the steps of determining a
probability distribution of a property of the data points in the initial class and
determining a probability distribution of said property of the data points in the
second class, and comparing the data point under test with the two probability
distributions. The probability calculations may also comprise the step of multiplying
the probability derived from the probability distribution with an a priori probability
derived, for example, from the proportion of points in the neighbourhood in the
various classes.

The calculation of probability may be adapted as the method proceeds by
recalculating the probability distributions as data points are assigned to the classes.
The distributions will alter as the number of data points in the classes varies. This
adaptation may take place every time a point is reassigned, or after a few points have
been reassigned.

The classes continue to grow as more data points are assigned to them.
Preferably the method continues until no more data points are added to the class, at
which point another class may be defined and then grown by repeating the method
steps.

The selection of the data point for initiating a class may be random, or it may 
be optimised, for example by ordering the remaining points based on the probability
distribution. 

Preferably classes are discarded (or "culled") if they fail to grow, i.e. if they
fail to have data points assigned to them when all necessary points have been tested.
This is particularly useful in avoiding over-segmentation of the data set.
Segmentation is concluded when all of the classes formed in turn on the basis of the
data points remaining in the initial class have been discarded.

A predetermined neighbourhood of a data point d is an open set that contains
at least the data point itself. One example is the open ball of radius r which contains
all data points within a distance r of the data point d, though other shapes are possible
and may be appropriate for different situations. In extreme cases, a neighbourhood
may contain only the data point itself, or may contain the entire data set. The first
and second predetermined neighbourhoods may be defined only on the spatial
position of the data points, for instance in the application of the technique to an
image where the aim is to segment the image into the different parts of the imaged
object. However, in other data sets the neighbourhoods may be defined in a
parameter space containing the data points.
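The open-ball neighbourhood described above can be illustrated with a short sketch. This is only an illustration under stated assumptions: the function name `neighbourhood`, the tuple representation of data points and the use of Euclidean distance are hypothetical choices, not part of the described method.

```python
import math

def neighbourhood(points, centre, radius):
    """Indices of the points lying in the open ball of the given radius
    around `centre` (hypothetical helper; Euclidean distance assumed)."""
    return [i for i, p in enumerate(points) if math.dist(p, centre) < radius]

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 4.0)]
near = neighbourhood(points, (0.0, 0.0), 1.5)        # a moderate radius
only_self = neighbourhood(points, (0.0, 0.0), 1e-9)  # extreme case: the point itself
everything = neighbourhood(points, (0.0, 0.0), 10.0) # extreme case: the whole data set
```

The same helper would serve for neighbourhoods defined in a parameter space, since the points are simply coordinate tuples of any dimension.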

Where the technique is applied to image segmentation, the data points may
comprise a descriptor of at least a part of an object in the image and the spatial
coordinates of that part. The descriptor may be representative of the shape, size,
intensity (brightness), colour or any other detected property, of that part of the object.

Rather than taking the data points from the image itself, they may be taken
from a spatial model fitted to the image, such as a 3-D mesh fitted to the image or its
segmentation. This is particularly useful where the descriptor is a descriptor of the
shape of the object.

The image may be a volumetric image or a non-invasive image, and for 
example may be an image in the medical field or industrial field (e.g. a part x-ray). 

Another aspect of the invention provides a method of demarcating different
parts of a structure in a representation of the structure, comprising the steps of
calculating for each of a plurality of data points in the representation at least one
shape descriptor of the structure at that point, and segmenting the representation on
the basis of said at least one shape descriptor.

The representation may be an image of the structure, or may be a 3-D model
of the structure (which could be derived by various imaging modalities). The results
may be displayed in the form of a visual representation of the structure, with the parts
distinguished, for instance by being shown in different colours.

The descriptor may comprise values representing cross-sectional size or shape 
of the structure at that point. The values may be lateral dimensions of the structure at
that point. 

The descriptors may be used to segment the representation automatically, for
example using an unsupervised segmentation method such as the method in
accordance with the first aspect of the invention.

The image may be a volumetric image or a non-invasive image, and for
example may be an image in the medical field or industrial field (e.g. a part x-ray). In
the medical field the method may be used to demarcate an aneurysm from
vasculature, or to demarcate other protrusions.

The invention extends to a computer program comprising program code
means for executing the methods on a suitably programmed computer. Further, the
invention extends to a system and apparatus for processing and displaying data
utilising the methods.

The invention will be further described by way of example, with reference to
the accompanying drawings in which:-

Figure 1 illustrates schematically an imaging system in accordance with one
embodiment of the invention;

Figure 2 is a flow diagram of one embodiment of the invention;

Figures 3A and 3B show respectively a 3-D model of an aneurysm and
adjoining vessels and a mesh computed for the 3-D model;

Figure 4 illustrates schematically a blood vessel and aneurysm indicating the
shape descriptors used in an embodiment of the present invention;

Figure 5 illustrates the concepts of data point classes and regions used in one
embodiment of the present invention;

Figure 6 illustrates a synthetic data set containing three groups of data points;

Figure 7 illustrates an initial probability distribution for the data set of Figure
6;

Figures 8A and 8B illustrate respectively a newly seeded class in the data set
of Figure 6 and the initial probability distribution for that class;

Figure 9 illustrates the classification after the class of Figure 8 has converged;

Figure 10 illustrates the classification after a further class has converged;

Figures 11A, B and C illustrate probability densities for the classes in Figure
10;

Figures 12A and B illustrate the seeding of a further class and its initial
probability distribution;

Figure 13 illustrates the final segmentation of the data set of Figure 6
achieved with one embodiment of the present invention; and

Figures 14 and 15 illustrate the results of applying the image segmentation
method of an embodiment of the invention to medical images.

An embodiment of the invention applied to the shape-based segmentation of
an image of vasculature including an aneurysm and to the intensity-based
segmentation of a synthetic image will be described below. However, it will be
appreciated that the segmentation technique is applicable to the segmentation of
general data sets having data points in n-dimensions, where each data point has m
numeric values. Thus it may be applied, for example, to intensity-based
segmentation, for instance of ultrasound, MRI, CTA, 3-D angiography or
colour/power Doppler data sets, to the segmentation of PC-MRA data where a scan
provides information on the speed (intensity) and an estimated flow direction, and to
unsupervised texture segmentation as well as object segmentation of parts based on
geometry.

Figure 1 illustrates schematically the apparatus used in one embodiment of
the invention which comprises an image acquisition device 1, a data processor 3 and
an image display 5. The operation of the apparatus is illustrated schematically by the
flow diagram of Figure 2 and involves the general steps of acquiring the image in step
s1, segmenting the image in step s3 and displaying the segmented image in step s4.

In this embodiment of the invention the segmentation is carried out on a
three-dimensional model of the imaged object (in the example below of vasculature)
calculated in step s2 by standard techniques such as A.C.S. Chung and J.A. Noble,
"Fusing magnitude and phase information for vascular segmentation in phase
contrast MR angiograms", Proceedings Medical Image Computing and Computer
Assisted Intervention (MICCAI), pp. 166-175, 2000 and D.L. Wilson and J.A.
Noble, "An Adaptive Segmentation Algorithm for Time-of-Flight MRA Data", IEEE
Transactions on Medical Imaging, Vol. 18, No. 10, pp 938-945, Oct. 1999.

A brain aneurysm is a localised persistent dilation of the wall of a blood
vessel. Visually, it appears that part of the vessel has ballooned out. If the
ballooned vessel ruptures, it will often result in the death of the patient. There are
several possible treatments for an aneurysm including surgery (clipping) or filling the
aneurysm with coils. The type of treatment is dependent upon factors such as
aneurysm volume, neck size and the location of the aneurysm in the brain.

It is usual to image the aneurysm and related blood vessels using a 3-D
imaging modality such as MRA, CTA or 3-D Angiography. Such scans may be
segmented to extract blood vessels and aneurysm from tissue and air. The segmented
data can then be used to produce a 3-D model of the vessels and aneurysm. Given
such a 3-D model, it is useful to demarcate the aneurysm, identifying where it
connects to the major vessel. This allows the estimation of aneurysm volume and
neck size and other geometry-related parameters, and hence aids the clinician to
choose the appropriate treatment for a particular patient and possibly to use the
information in the actual treatment (e.g. to select views of the aneurysm). In this
embodiment the aneurysm is demarcated by first computing a triangular mesh over
the 3-D model. Such a mesh can be computed using an established mesh method
such as the marching cubes algorithm. An example of a 3-D model showing an
aneurysm and the adjoining vessels, and its associated mesh is illustrated in Figures
3A and B. The aneurysm is the large ballooning section near the centre of the image.

The segmentation will be carried out in this embodiment using two values
which form a shape descriptor, i.e. a description of the shape of the vasculature at
that point. At each vertex in the triangular mesh, a local description of the vessel
shape is computed, as shown in Figure 4. Taking the unit surface normal to the
mesh at a particular vertex v_i, a ray is extended from v_i into the vessel and the
distance to the opposite side of the vessel is measured. Halving this value gives an
estimate r_i of the vessel radius at v_i. This estimate of vessel radius is the first of two
descriptors that are computed.

Using r_i, the point p_i is defined as an estimate of the vessel centre, namely the
point lying a distance r_i from v_i along the ray extended into the vessel.

The two directions of principal curvature on the mesh, that is the directions in
which the curvature of the mesh at v_i is a maximum and a minimum, can then be
estimated. Denoting these directions as c_max and c_min, where the absolute value of
c_max is larger than the absolute value of c_min, a vector from p_i in the directions of
c_max and -c_max is extended, measuring the distance in each direction to the vessel
surface. Adding these two distances together gives an estimate of the vessel diameter
d_i in a direction perpendicular to the surface normal n_i.

The two values (r_i, d_i) form the shape descriptor which characterises the
vessel at the point v_i and are computed for every vertex in the mesh.

The task is now to segment the data set to demarcate the aneurysm, i.e. to
group together points that lie on the aneurysm and to distinguish these from points on
the adjoining vessels. This will allow the aneurysm to be demarcated. Points lying
along the single blood vessel will have similar values for (r_i, d_i). At the neck of the
aneurysm, these values will change rapidly. Passing over the neck and onto the
aneurysm itself, there will be a similarity in the values on the aneurysm.

Segmentation is achieved in this embodiment by using a region splitting
algorithm. The algorithm separates the points on the triangular mesh into regions
(sub-parts) that are similar. Each vessel should be identified as a sub-part, while the
aneurysm will form a different sub-part.

Firstly, to illustrate the concepts used in the segmentation method it will be
helpful to consider the simple set of points illustrated in Figure 5. Suppose the task is
to classify data point d_0. It is assumed that it must be in the same class as one of the
other five data points that lie within the dotted circular neighbourhood, i.e. within a
distance r_classify of the data point under consideration. Of these, as indicated in
Figure 5, d_1 and d_2 belong to class C_1; d_3 and d_4 belong to class C_2; and d_5
belongs to class C_3. The point d_0 will be classified depending upon some property
which it holds in common with the data points in one of the other classes. This
property may, for example, be its intensity or colour if the points are pixels in an
image, or a shape descriptor such as that described above in connection with the task
of aneurysm demarcation, and can be a scalar or n-vector quantity. The approach in
this embodiment is to calculate the probabilities in turn that the point d_0 is in each of
the classes C_1, C_2 or C_3 and then to assign it to the class for which the probability is
the highest. In this embodiment the probability will be the product of two terms. The
first is a probability that is independent of the property of interest of d_0. The second
is a probability based on the value of the property (for example intensity or shape
descriptor) of the point and a comparison with the distribution of such values in each
of the three classes.

Taking the first of those probabilities, there are several ways of calculating
this probability. One way is to set it as being directly proportional to the number of
data points of each class within the radius r_classify. For example, referring to Fig. 5,
this probability term as regards class C_1 would be 2/5 because 2 of the 5 points
within the distance r_classify are points of class C_1. There are other possibilities, such
as setting the probability in accordance with the Euclidean distance in real or
parameter space between the various points. This term, which does not depend on the
value of the property of interest at the data point, is known as the "a priori"
probability.
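The proportional form of the a priori term can be sketched as follows for the Figure 5 example. The function name `a_priori` and the string class labels are assumptions made for illustration only.

```python
from collections import Counter

def a_priori(neighbour_labels):
    """A priori probability of each class, taken as the proportion of the
    neighbouring points that carry that class label (hypothetical helper)."""
    counts = Counter(neighbour_labels)
    total = len(neighbour_labels)
    return {label: n / total for label, n in counts.items()}

# The five neighbours of d_0 in Figure 5: two in C_1, two in C_2, one in C_3.
priors = a_priori(["C1", "C1", "C2", "C2", "C3"])
```

Here `priors["C1"]` is 2/5, matching the value quoted in the text, and the three terms sum to one.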

The second term, based on the value of the property of interest of point d_0
(such as intensity or shape descriptor) is, in this embodiment, obtained by comparing
the value of the property for d_0 to the distribution of such values in the three classes
C_1, C_2, C_3. This will be described below with reference to a specific intensity-based
example illustrated in Figure 6. Figure 6 illustrates a data set which consists of
intensity values. The aim is to segment this image automatically into the three
regions or classes which are clearly visible. The first step is to assign all data points
(in this case pixels) to a single initial class C_0. Then the probability distribution (in
this case of intensity on a grey scale) over the class C_0 is calculated. In this case it is
calculated by computing a histogram of the values of intensity (i.e. binning the
intensity values, counting the number of values within each bin, and normalising the
total count to 1). The histogram is then smoothed using Parzen windows by
convolving the values in the histogram using a kernel function. The kernel function
used in this embodiment is the Gaussian function, although others may be used. This
smoothing function is adaptive as will be explained below. The result is the initial
probability distribution as illustrated in Figure 7. Incidentally, in Figure 7 three peaks
corresponding to the three classes of Figure 6 can be seen.
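A minimal sketch of the histogram-and-smoothing step is given below. It assumes integer-valued intensities binned one value per bin, a Gaussian kernel truncated at three standard deviations, and renormalisation after smoothing; the function name and these details are illustrative assumptions rather than the embodiment's implementation.

```python
import math

def smoothed_histogram(values, n_bins, sigma):
    """Normalised histogram over bins 0..n_bins-1, smoothed by convolution
    with a Gaussian kernel (a simple Parzen-window style estimate)."""
    hist = [0.0] * n_bins
    for v in values:
        hist[min(max(int(v), 0), n_bins - 1)] += 1.0
    hist = [h / len(values) for h in hist]        # total count normalised to 1

    half = max(1, math.ceil(3 * sigma))           # assumed truncation at 3 sigma
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-half, half + 1)]

    smoothed = [0.0] * n_bins
    for i in range(n_bins):
        for j, w in enumerate(kernel):
            src = i + j - half
            if 0 <= src < n_bins:
                smoothed[i] += w * hist[src]
    total = sum(smoothed)
    return [s / total for s in smoothed]          # renormalise after smoothing

# Two groups of pixels with intensities 20 and 100 give two peaks.
dist = smoothed_histogram([20] * 50 + [100] * 50, n_bins=256, sigma=5.0)
```

A larger `sigma` spreads each peak more widely and lowers its height, which is the property the adaptive smoothing relies on.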

The next step is to start or "seed" a new class. This is achieved by choosing a
data point, defining a neighbourhood of a predetermined radius around it, and
assigning all points within the neighbourhood to the new class C_1. This is illustrated
in Figure 8A. In some embodiments the point may be chosen randomly, although in
other embodiments the points in the data set may be ordered for selection, for
instance in accordance with how badly they are modelled by the remaining class. It
can be seen that the new class C_1 happens to be in the bottom left-hand area of the
image. Then the probability distribution of intensity values is calculated for the class
C_1 in just the same way as the probability distribution above (namely by forming a
histogram and then smoothing it). This probability distribution is illustrated in
Figure 8B.

It was mentioned above that the smoothing is adaptive. In this embodiment
this is achieved by making the variance of the Gaussian kernel function dependent
upon the number of data points in the class. This greatly affects the probability
distribution produced. When the histogram comprises only a small number of
values, it is appropriate to use a large variance. This results in heavy smoothing. If
the histogram consists of a large number of values, it is more likely that the
probability distribution accurately reflects the underlying distribution, and so a small
variance is appropriate, resulting in less smoothing. The variance may be defined as
a function of the number of data points in a class, such that as the number of data
points in the class increases, the variance decreases. In this example, the variance is
inversely proportional to the square of an affine function of the size of the class.
Other functions are possible. For example, the variance may be inversely
proportional to the natural logarithm of the number of data points in the class.
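The two variance rules mentioned above can be written down directly. The constants `k`, `a` and `b` below are placeholder assumptions chosen only to make the sketch runnable; the text does not give their values.

```python
import math

def variance_affine(n, k=1.0e4, a=0.01, b=1.0):
    """Variance inversely proportional to the square of an affine function
    a*n + b of the class size n (k, a and b are assumed constants)."""
    return k / (a * n + b) ** 2

def variance_log(n, k=50.0):
    """Alternative: variance inversely proportional to ln(n); needs n >= 2."""
    return k / math.log(n)

small_class = variance_affine(10)      # few points: heavy smoothing
large_class = variance_affine(1000)    # many points: light smoothing
```

Both functions decrease monotonically with the class size, so a class that gains points is smoothed less and a class that loses points is smoothed more.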

Note that functions other than a Gaussian can be used as the kernel function
for the Parzen window estimate of the probability distribution. In this case, some
property of the kernel function comparable to the Gaussian's variance will be
adjusted as a class grows or shrinks.

The next step is to test data points near the class C_1 to check whether or not
they can be assigned to class C_1. In this embodiment all points d_j are tested which lie
within a radius r_classify of any point in the class C_1. The testing involves selecting a
point d_j and computing the probabilities that this point belongs to class C_0 or C_1. For
each class, this involves computing two values, which are multiplied together to
compute the probability.

The first value is the a priori probability that d_j belongs to each class. As
mentioned above this probability is independent of the value of the property of
interest. In this example it is taken as the proportion of points within a radius
r_classify of d_j that are in the relevant class, as explained in relation to Figure 5.

The second value is computed by comparing the value of the property of
interest (intensity or shape descriptor etc.) with the probability distributions computed
for the class. For classes C_0 and C_1 these probability distributions are shown in
Figures 7 and 8B. Thus, for example, if the point d_j has an intensity corresponding to
the value 20 on the horizontal axis of the distribution, the value for class C_0 can be
read off as 0.010 whereas the value for class C_1 can be read off as about 0.027.
These values are multiplied with the a priori probabilities to give the probability that
data point d_j belongs to either class C_0 or C_1. In the example of the two values that
we have quoted, where d_j has an intensity of 20, if the a priori probabilities are of a
similar magnitude, then class C_1 will have a higher probability and the data point will
be assigned to class C_1.
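The worked example above can be checked with a few lines of arithmetic. The values 0.010 and 0.027 are those quoted in the text; the a priori values (0.5 each, and a skewed 0.8 against 0.2) are assumptions introduced purely for illustration, as is the function name `classify`.

```python
def classify(likelihoods, priors):
    """Return (best_class, scores), each score being the product of the
    value read off the class distribution and the a priori probability."""
    scores = {c: likelihoods[c] * priors[c] for c in likelihoods}
    return max(scores, key=scores.get), scores

likelihoods = {"C0": 0.010, "C1": 0.027}   # read off for intensity 20

# Similar a priori probabilities: C_1 wins, as stated in the text.
label, scores = classify(likelihoods, {"C0": 0.5, "C1": 0.5})

# A strongly skewed a priori term can reverse the decision.
skewed_label, skewed_scores = classify(likelihoods, {"C0": 0.8, "C1": 0.2})
```

With equal a priori terms the products are about 0.005 for C_0 and 0.0135 for C_1, so the point moves to C_1. With the skewed terms the products are about 0.008 and 0.0054, so the point stays in C_0, which shows why the neighbourhood term matters near class borders.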

Thus the class C_1 grows with each point that is assigned to it. The testing is
repeated recursively, choosing all points within a radius r_classify of each point added
to class C_1 and testing whether they should be reclassified to class C_1. It should be
noted that only points which are currently in class C_0 are considered (in other words
reclassified points are not subsequently reconsidered). It is important to note, though,
that each time a point is reassigned, the probability distributions for the two classes
are recalculated with a new variance for the Gaussian kernel set in accordance with
the change in the number of points. Where there are a large number of data points
such that the probability distribution does not vary much as a single point is
reassigned, the recalculation of the probability distribution need not occur every time
a point is reassigned, but after a preset number of points have been reassigned. This
means that the probability distribution varies adaptively as the classification process
proceeds.
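The growth step can be sketched on a one-dimensional signal as follows. This is a simplified illustration under several assumptions: positions are array indices, the class densities are direct Gaussian Parzen estimates rebuilt at every test rather than cached histograms, and the kernel width function `kernel_sigma` with its constants is hypothetical. The loop runs until no further points are added to the growing class.

```python
import math
from collections import deque

def parzen_density(x, members, sigma):
    """Gaussian Parzen-window estimate of the density at x for one class."""
    norm = len(members) * sigma * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - v) / sigma) ** 2) for v in members) / norm

def kernel_sigma(n, k=15.0, a=0.01, b=1.0):
    """Assumed width: the variance sigma**2 falls as 1 / (a*n + b)**2."""
    return k / (a * n + b)

def grow_class(values, labels, new_label, r_classify):
    """Grow class `new_label` in place; returns its size at convergence."""
    n = len(values)
    queue = deque(i for i in range(n) if labels[i] == new_label)
    while queue:
        q = queue.popleft()
        for c in range(max(0, q - r_classify), min(n - 1, q + r_classify) + 1):
            if labels[c] != 0:      # only points still in the initial class are tested
                continue
            nbrs = [j for j in range(max(0, c - r_classify),
                                     min(n - 1, c + r_classify) + 1) if j != c]
            scores = {}
            for lab in {labels[j] for j in nbrs}:
                members = [values[j] for j in range(n) if labels[j] == lab]
                prior = sum(labels[j] == lab for j in nbrs) / len(nbrs)
                sigma = kernel_sigma(len(members))   # adapts to the current class size
                scores[lab] = prior * parzen_density(values[c], members, sigma)
            rivals = [s for lab, s in scores.items() if lab != new_label]
            if scores[new_label] > max(rivals, default=0.0):
                labels[c] = new_label
                queue.append(c)     # its own neighbours are tested in turn
    return labels.count(new_label)

# Two flat regions; a class seeded inside the right-hand region floods it.
values = [10.0] * 30 + [200.0] * 30
labels = [0] * 60
for i in range(43, 48):
    labels[i] = 1
size = grow_class(values, labels, new_label=1, r_classify=2)
```

In this toy case the seeded class stops at the boundary between the two regions, because points on the far side are modelled far better by the initial class.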

The variance used, therefore, when computing the probability that a point
under test belongs to the initial class C_0 will increase as points are removed from the
class, and the variance used to compute the probability that the point belongs to class
C_1 will decrease as that class grows. In this way, C_1 will improve its model of the
distribution of numeric values for the property of interest in the class, and this
distribution will be removed gradually from the three distributions that together
formed the distribution for class C_0 illustrated in Figure 7.

The process of testing points for addition to class C_1 is continued until no
new points within a radius r_classify of the existing points in the class are added. This is
the situation indicated in Figure 9. If viewed graphically, the class C_1 appears to
"flood-fill" out to the borders of the class as shown in Figure 9.

Then the process is repeated by seeding a new class C_2 on a point in class C_0
and growing that class. Whilst growing the class C_2, when testing whether to
reassign some point d_j from class C_0 to class C_2, it may be found that points from
class C_1 also lie within a neighbourhood of radius r_classify of d_j. In this case, it is
tested whether to assign data point d_j to class C_0, C_1 or C_2.

After this second class C_2 has converged, the data will be classified into C_0,
C_1 and C_2 as shown in Figure 10. Figure 11 shows the probability distributions for
the three classes.

Because this is an unsupervised algorithm, the process does not, of course,
"know" that there are no more classes of points. Therefore the process will continue
by seeding a new class C_3 as shown in Figure 12A. The initial probability
distribution for class C_3 is shown in Figure 12B. However, this class will, in fact, not
grow in the way that C_1 and C_2 did. The algorithm is designed to discard classes
which do not grow (by reclassifying their points back to class C_0). The reason that
class C_3 does not grow will be explained. First, because C_3 contains fewer points
than C_0, the probability distribution is generated by convolving with a Gaussian
kernel function with a large variance. Thus it is more smoothed than the probability
distribution for the remaining points in C_0. This results in lower probabilities being
read off for values from the underlying distribution. It will be seen that in Figure
12B the maximum probability is 0.045, while the maximum for the remaining class
C_0 is 0.06 as shown in Figure 11A. Thus as class C_3 attempts to grow, by testing data
points, most points will not be re-classified from C_0 to C_3, but will remain instead in
C_0. If the class does not grow sufficiently it will be "culled". The growth is tested
against a threshold. In this example if, at convergence, a class is less than three times
as large as when it was seeded it is culled. Other criteria, for example based on the
rate of growth, are possible. In this way the algorithm does not introduce an
excessive number of classes to the segmentation.
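The growth threshold in this example can be expressed as a simple test. The function names, the integer labels and the label list are illustrative assumptions; only the factor of three comes from the text.

```python
def should_cull(seed_size, converged_size, growth_factor=3):
    """True if the class, at convergence, is less than `growth_factor`
    times as large as when it was seeded."""
    return converged_size < growth_factor * seed_size

def cull(labels, label, initial_label=0):
    """Return a copy of `labels` with the culled class's points
    reclassified back to the initial class."""
    return [initial_label if lab == label else lab for lab in labels]

labels = [0, 0, 3, 3, 3, 0, 1, 1]
if should_cull(seed_size=3, converged_size=labels.count(3)):
    labels = cull(labels, 3)
```

Here the class labelled 3 did not grow beyond its seed, so its points are returned to the initial class and the labels of the other classes are untouched.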

In practice the algorithm continues to attempt to seed new classes on each of
the points left in C_0, but each new class will be culled. The final segmentation is
shown in Figure 13. It can be seen that the segmentation is fairly accurate.

It should be noted that the algorithm can be applied again within each of the
classes C_0, C_1, C_2 to check for segmentation within those classes. Thus each class is
taken in turn, all its data points regarded as an initial class and a new class seeded
within it, the method then proceeding as before.




The data set need not comprise all data points available (e.g. all pixels in the
image or all points in the model). A subset of the data points may be selected to
optimise the segmentation (e.g. by excluding obvious outliers). In addition, not all
data points in a class may be used in the computation of the probability distribution.
A subset of the data points may be selected (e.g. by excluding outliers according to
some statistical test).

The algorithm therefore involves segmenting a data set by initially assigning
all points to a single class and then randomly seeding and growing new classes. The
probability distributions in the classes are adaptive and this, together with the culling
of classes which do not grow, means that over-segmentation is avoided.

In applying the algorithm to the problem of demarcation of an aneurysm,
instead of intensity values, the two-dimensional shape descriptor is used. Thus,
referring to Figure 3, the 3-D model of the aneurysm and blood vessels is calculated
from an image of the vasculature and a triangular mesh is defined over the model. At
each point on the mesh the two-dimensional data points (r, d) are computed which
describe the shape of the vessel or aneurysm at that point. The algorithm is then
applied by initially assigning all points to the same region, and then seeding a new
region somewhere on the mesh. The method attempts to grow this new region. If it
does not grow, it is culled. At completion, the mesh is separated into the appropriate
regions, with the aneurysm separated from its adjoining vessels on the basis of its
shape descriptor.
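As a rough illustration of how a single mesh point might be tested when the data points are two-dimensional descriptors, the following sketch compares a descriptor against two regions. It assumes, purely for simplicity, independent Gaussians on each axis with a floor on the spread; the function names and the numerical values are made up for the example and are not taken from the clinical data.

```python
import math


def diag_gaussian_logpdf(point, samples, min_sigma=0.1):
    """Log-density of a 2-D point under independent per-axis Gaussians.

    The axis-independent Gaussian model is an assumption of this sketch.
    """
    total = 0.0
    for axis in range(2):
        column = [s[axis] for s in samples]
        mu = sum(column) / len(column)
        var = sum((c - mu) ** 2 for c in column) / len(column)
        sigma = max(math.sqrt(var), min_sigma)  # floor avoids a zero spread
        z = (point[axis] - mu) / sigma
        total += -0.5 * z * z - math.log(sigma * math.sqrt(2 * math.pi))
    return total


def prefers_new_region(point, initial_region, new_region):
    """True if the descriptor is more probable under the new region."""
    return (diag_gaussian_logpdf(point, new_region)
            > diag_gaussian_logpdf(point, initial_region))


# Hypothetical (r, d) descriptors in arbitrary units.
vessel = [(1.0, 2.0), (1.1, 2.1), (0.9, 1.9)]
aneurysm = [(3.0, 6.0), (3.2, 6.3), (2.9, 5.8)]
```

A descriptor close to the seeded region's values would be assigned to it, while one resembling the vessel would stay in the initial region.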

Figures 14 and 15 show the application of an embodiment of the invention to
two clinical data sets. The results for two patients with aneurysms are shown and in
each case the three views of the 3-D brain model are shown on the left, and the
segmented results on the right. In each case the aneurysm present is successfully
identified.

The method can, of course, also be applied to intensity-based segmentation,
such as the segmentation of B-mode ultrasound follicle images, where it has
successfully demarcated regions indicating follicles. The method is also applicable
to the segmentation of MRI, CTA, 3-D angiography and colour/power Doppler sets
where blood can be distinguished from other tissue types by its intensity.

CLAIMS



1. An unsupervised segmentation method for assigning multi-dimensional data
points of a selected data set amongst a plurality of classes, the method comprising the
steps of:

(a) defining an initial class encompassing all data points of the selected data
set;

(b) defining a second class by selecting a data point and assigning it to the
second class together with data points within a first predetermined
neighbourhood of the selected data point;

(c) testing each data point lying within a second predetermined
neighbourhood of data points in the second class by calculating the
probability that each said data point belongs to the first class and the
probability that it belongs to the second class, and assigning it to the second
class if the probability that it belongs to the second class is higher; and

(d) said probability calculations being adapted during said method in
dependence upon the assignment of the points to the classes.

2. A method according to claim 1 wherein the probability calculations comprise the
steps of determining a probability distribution of a property of the data points in the
initial class and determining a probability distribution of said property of the data
points in the second class and comparing the data point under test with said
probability distributions.

3. A method according to claim 1 or 2 wherein said calculation is adapted by
recalculating said probability distributions as data points are assigned to classes.

4. A method according to claim 3 wherein said probability distributions are
recalculated on the basis of the number of data points in each class.

5. A method according to claim 4 wherein said probability distributions are
recalculated after each assignment of a data point.

6. A method according to any one of the preceding claims wherein steps (b), (c) and
(d) are repeated iteratively, testing in step (c) data points lying within the second
predetermined neighbourhood of data points assigned to the second class.

7. A method according to claim 6 wherein steps (b) to (d) are repeated iteratively 
until no more data points are added to the second class. 

8. A method according to any one of the preceding claims further comprising the
step of defining a third class by selecting a data point from the initial class and
assigning it to the third class together with data points within the first predetermined
neighbourhood of the selected data point, and repeating the method iteratively with
respect to the third class.

9. A method according to any one of the preceding claims further comprising the
step of discarding any class which fails to have sufficient data points assigned to it in
step (c) according to a predetermined criterion, by reassigning its data points to the
initial class, when all data points within said predetermined neighbourhood have
been tested.

10. A method according to claim 9 further comprising the step of concluding the
segmentation when all classes formed in turn on the basis of selecting each of the
data points remaining in the initial class have been discarded.



11. A method according to any one of the preceding claims wherein said first and
second predetermined neighbourhoods are open spheres centred on the data point
having a predetermined radius.

12. A method according to any one of the preceding claims wherein said first and
second predetermined neighbourhoods are defined on a parameter space containing
the data points.

13. A method according to any one of the preceding claims wherein said data points
are derived from an image, said classes corresponding to different physical parts in
said image.

14. A method according to claim 13 wherein said property of said data points
comprises a descriptor of at least part of an object in the image and the spatial
coordinates of that part.

15. A method according to claim 14 wherein the descriptor comprises at least a 
value representing the shape of at least part of said object. 

16. A method according to claim 15 wherein the descriptor comprises at least a
value representing the size of at least part of said object.

17. A method according to any one of claims 13 to 16 wherein the image is a
medical image.

18. A method according to any one of claims 13 to 16 wherein the image is a 
volumetric image or non-invasive image. 

19. A method according to any one of claims 13 to 18 wherein the data points are
taken from a spatial model fitted to said image.

20. A method of demarcating different parts of a structure in a representation of the
structure, comprising the steps of calculating for each of a plurality of data points in
the representation at least one shape descriptor of the structure at that point, and
segmenting the representation on the basis of said at least one shape descriptor.



21. A method according to claim 20 wherein the descriptor comprises values
representing the cross-sectional size of the structure at that point.

22. A method according to claim 21 wherein the values representing the cross- 
sectional size are the lateral dimensions of the structure at that point. 

23. A method according to any one of claims 20 to 22 wherein the representation is a
spatial model fitted to an image of the structure.

24. A method according to any one of claims 20 to 23 wherein the representation is 
segmented automatically. 

25. A method according to claim 24 wherein the representation is segmented using
an unsupervised segmentation method.

26. A method according to any one of claims 20 to 23 wherein the representation is 
segmented by hand. 

27. A method according to any one of claims 20 to 26 wherein the structure is in the 
human or animal body. 

28. A method according to any one of claims 20 to 26 wherein the representation is a
medical image.

29. A method according to any one of claims 20 to 26 wherein the image is a 
volumetric or non-invasive image. 

30. A method according to any one of claims 20 to 29 wherein the representation is a
model of the structure.

31. A method according to any one of claims 20 to 30 wherein the segmentation 
method is in accordance with any one of claims 1 to 19. 

32. A computer program comprising program code means for executing on a 
programmed computer the method of any one of the preceding claims. 

33. Apparatus for segmenting a data set of multi-dimensioned data points, the
apparatus comprising:

means for receiving the data set;

a data processor for segmenting the data set in accordance with the method of
any one of claims 1 to 19; and

a display device for displaying the segmented data set.

34. Apparatus according to claim 33 wherein the means for receiving the data set
comprises an acquisition device for acquiring the data set from a subject.

35. Apparatus for demarcating different parts of a structure in a representation of the
structure, the apparatus comprising:

means for receiving said representation in the form of a data set;
a data processor for processing said data set to demarcate the different parts
of the structure in accordance with the method of any one of claims 20 to 31.

ABSTRACT
UNSUPERVISED DATA SEGMENTATION



An unsupervised method of segmenting data sets using a region growing
technique in which data points are initially assigned to a single class, new classes are
seeded and points in the data set tested by calculating the probability that they belong
to the new class. The probability distributions used in the calculation are adapted as
points are reassigned. Classes which fail to grow are discarded. The technique may
be applied to the segmentation of data sets in which the data points are taken from
medical images. The method may be applied to the demarcation of different parts of
structures, e.g. in the medical field demarcating an aneurysm from the surrounding
blood vessels in an image or 3-D model of a patient's vasculature. The method may
involve using a shape descriptor which is representative of the shape of the structure
at each point under consideration. Thus the different parts are distinguished on the
basis of their shape.

Figure 1: Schematic of imaging system.
Figure 2: Flow diagram of one embodiment of the invention.



Figure 3: a.) 3-D model of an aneurysm and adjoining vessels. b.) Mesh computed for the 3-D model.




Figure 4: Local shape descriptors, vessel radius and the perpendicular diameter. 




Figure 5: Point and neighbourhood. 




Figure 6: Synthetic data containing three groups.
Figure 7: Initial probability P(vj | dj ∈ C0).
Figure 8: a.) Seed for Class C1. b.) Initial probability for P(vj | dj ∈ C1).



Figure 12: a.) Seed for Class C3. b.) Initial probability for P(vj | dj ∈ C3).




Figure 13: Final segmentation of synthetic data.